Hands-on Exercise 3

Programming Data Visualisation with R (Interactive vs Animated)

Author

Teo Suan Ern

Published

January 11, 2024

Modified

February 24, 2024

Note: First modified during in-class exercise 4 on Section 2.1.1 Tooltip effect with tooltip aesthetic. Last modified to include author’s details.

1. Getting Started

This exercise covers the programming of interactive data visualisation (Section 2) and animated data visualisation (Section 3).

1.1 Install and launch R packages

Getting Started (Interactive Data Visualisation)

For 2. Interactive Data Visualisation

  • ggiraph for making ‘ggplot’ graphics interactive.

  • plotly, an R library for plotting interactive statistical graphs.

  • DT provides an R interface to the JavaScript library DataTables, which creates interactive tables on HTML pages.

  • tidyverse, a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.

  • patchwork for combining multiple ggplot2 graphs into one figure.

Getting Started (Animated Data Visualisation)

For 3. Animated Data Visualisation

  • plotly, an R library for plotting interactive statistical graphs.

  • gganimate, a ggplot2 extension for creating animated statistical graphs.

  • gifski converts video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colors per frame.

  • gapminder, an excerpt of the data available at Gapminder.org. Only its country_colors scheme is used here.

  • tidyverse, a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.

Show code
pacman::p_load(ggiraph, plotly, 
               patchwork, DT, tidyverse,
               readxl, gifski, gapminder,
               gganimate)

1.2 Import the data

The code chunk below imports Exam_data.csv into the R environment by using the read_csv() function of the readr package.

  • readr is one of the tidyverse packages.

  • read_csv() imports the Exam_data.csv data file into R and saves it as a tibble data frame called exam_data.

Show code
exam_data <- read_csv("data/Exam_data.csv")

The Data worksheet of the GlobalPopulation Excel workbook will also be used.

The code chunk below imports the Data worksheet of the GlobalPopulation Excel workbook by using read_xls() of the readxl package, which belongs to the tidyverse family.

Show code
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_each_(funs(factor(.)), col) %>%
  mutate(Year = as.integer(Year))

mutate_each_() was deprecated in dplyr 0.7.0 and funs() was deprecated in dplyr 0.8.0. In view of this, the code is rewritten by using mutate_at(), as shown in the code chunk below.

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

Instead of mutate_at(), across() can be used to derive the same output. Wrapping the column vector in all_of() makes the selection by name explicit.

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate(across(all_of(col), as.factor)) %>%
  mutate(Year = as.integer(Year))
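
As a quick check of what the factor conversion does, the sketch below applies the same across() pattern to a small made-up tibble. The name toy_pop and its values are illustrative only and are not taken from GlobalPopulation.xls.

```r
library(dplyr)

# Hypothetical stand-in for the imported worksheet
toy_pop <- tibble(
  Country   = c("Albania", "Algeria", "Angola"),
  Continent = c("Europe", "Africa", "Africa"),
  Year      = c(1996, 1996, 1996)
)

col <- c("Country", "Continent")

toy_pop <- toy_pop %>%
  mutate(across(all_of(col), as.factor)) %>%  # character -> factor
  mutate(Year = as.integer(Year))             # double -> integer

# Inspect the resulting column types
sapply(toy_pop, class)
```

After the conversion, summary() reports counts per level for the factor columns instead of only their length and class.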

1.3 Overview of the data

Show code
summary(exam_data)
      ID               CLASS              GENDER              RACE          
 Length:322         Length:322         Length:322         Length:322        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
    ENGLISH          MATHS          SCIENCE     
 Min.   :21.00   Min.   : 9.00   Min.   :15.00  
 1st Qu.:59.00   1st Qu.:58.00   1st Qu.:49.25  
 Median :70.00   Median :74.00   Median :65.00  
 Mean   :67.18   Mean   :69.33   Mean   :61.16  
 3rd Qu.:78.00   3rd Qu.:85.00   3rd Qu.:74.75  
 Max.   :96.00   Max.   :99.00   Max.   :96.00  
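
Notes on the globalPop import code in Section 1.2:

  • read_xls() of the readxl package is used to import the Excel worksheet.

  • mutate_each_() of the dplyr package (mutate_at() or across() in the rewritten versions) is used to convert the character fields Country and Continent into factors.

  • mutate() of the dplyr package is used to convert the values of the Year field into integers.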

Show code
summary(globalPop)
        Country          Year          Young             Old       
 Afghanistan:  28   Min.   :1996   Min.   : 15.50   Min.   : 1.00  
 Albania    :  28   1st Qu.:2010   1st Qu.: 25.70   1st Qu.: 6.90  
 Algeria    :  28   Median :2024   Median : 34.30   Median :12.80  
 Andorra    :  28   Mean   :2023   Mean   : 41.66   Mean   :17.93  
 Angola     :  28   3rd Qu.:2038   3rd Qu.: 53.60   3rd Qu.:25.90  
 Anguilla   :  28   Max.   :2050   Max.   :109.20   Max.   :77.10  
 (Other)    :6036                                                  
   Population                Continent   
 Min.   :      3.3   Africa       :1568  
 1st Qu.:    605.9   Asia         :1454  
 Median :   5771.6   Europe       :1344  
 Mean   :  34860.9   North America: 976  
 3rd Qu.:  22711.0   Oceania      : 526  
 Max.   :1807878.6   South America: 336  
                                         

2. Interactive Data Visualisation

2.1 Working with ggiraph methods

ggiraph is an htmlwidget and a ggplot2 extension. It allows ggplot graphics to be interactive.

Interactivity is achieved with ggplot geometries that understand three aesthetics:

  • tooltip: a column of the data set that contains the tooltips to be displayed when the mouse is over elements.

  • onclick: a column of the data set that contains a JavaScript function to be executed when elements are clicked.

  • data_id: a column of the data set that contains an id to be associated with elements.

If ggiraph is used within a Shiny application, elements associated with an id (data_id) can be selected and manipulated on the client and server sides.

2.1.1 Tooltip effect with tooltip aesthetic

An interactive statistical graph built with the ggiraph package is created in two parts:

  1. A ggplot object is created with an interactive geom, and
  2. girafe() of ggiraph is used to turn it into an interactive Scalable Vector Graphics (SVG) object.
Show code
p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = ID),
    stackgroups = TRUE, 
    binwidth = 1, 
    method = "histodot") +
  scale_y_continuous(NULL, 
                     breaks = NULL)
girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)
Summary

There are two steps involved in plotting an interactive statistical graph:

  1. An interactive version of a ggplot2 geom (i.e. geom_dotplot_interactive()) is used to create the basic graph.
  2. girafe() is used to generate an SVG object to be displayed on an HTML page.

By hovering the mouse pointer over a data point of interest, the student’s ID is displayed.

Displaying multiple information on tooltip

Show code
exam_data$tooltip <- c(paste0(     
  "Name = ", exam_data$ID,         
  "\n Class = ", exam_data$CLASS,
  "\n Race = ", exam_data$RACE)) 

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = tooltip), 
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(
  ggobj = p,
  width_svg = 8,
  height_svg = 8*0.618
)
Summary
  • The first four lines of code in the code chunk create a new field called tooltip.

  • The code combines the text in the ID, CLASS and RACE fields into the newly created field.

  • The newly created field is then mapped to the tooltip aesthetic inside aes() of geom_dotplot_interactive().
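
To see what the tooltip field holds, the same paste0() call can be run on a two-row stand-in for exam_data. The rows below are invented for illustration. Each "\n" becomes a line break in the rendered tooltip.

```r
# Hypothetical two-row stand-in for exam_data
toy_exam <- data.frame(
  ID    = c("Student001", "Student002"),
  CLASS = c("3A", "3B"),
  RACE  = c("Chinese", "Malay")
)

# paste0() is vectorised, so one label is built per row
toy_exam$tooltip <- paste0(
  "Name = ", toy_exam$ID,
  "\n Class = ", toy_exam$CLASS,
  "\n Race = ", toy_exam$RACE
)

# cat() prints the first label with its line breaks applied
cat(toy_exam$tooltip[1])
```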

Customising tooltip style

The code chunk below uses opts_tooltip() of ggiraph to customise tooltip rendering by adding CSS declarations. The tooltip background is white and the font is black and bold.

Show code
tooltip_css <- "background-color:white; font-weight:bold; color:black;"

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(tooltip = ID),                   
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618,
  options = list(
    opts_tooltip(
      css = tooltip_css))
)

Displaying statistics on tooltip

The code chunk below shows a more advanced way to customise the tooltip. In this example, a function formats the mean and its standard error, which stat_summary() computes with mean_se(). The derived statistics are then displayed in the tooltip.

Show code
tooltip <- function(y, ymax, accuracy = .01) {
  mean <- scales::number(y, accuracy = accuracy)
  sem <- scales::number(ymax - y, accuracy = accuracy)
  paste("Mean maths scores:", mean, "+/-", sem)
}

gg_point <- ggplot(data=exam_data, 
                   aes(x = RACE)) +
  stat_summary(aes(y = MATHS, 
                   tooltip = after_stat(  
                     tooltip(y, ymax))),  
    fun.data = "mean_se", 
    geom = GeomInteractiveCol,  
    fill = "light blue"
  ) +
  stat_summary(aes(y = MATHS),
    fun.data = mean_se,
    geom = "errorbar", width = 0.2, 
    linewidth = 0.2  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  ) +
  ggtitle("Maths scores of Primary by Race")

girafe(ggobj = gg_point,
       width_svg = 8,
       height_svg = 8*0.618)
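
The numbers behind the tooltip can be checked outside the plot. The sketch below, which uses made-up scores, calls mean_se() directly and formats its output with a helper equivalent to the tooltip() function above (renamed format_tooltip here only to keep the example self-contained). By default mean_se() returns the mean plus or minus one standard error. Passing mult = qnorm(0.95) is one way to approximate a 90% interval under a normal assumption; that part is an addition for illustration and is not in the chunk above.

```r
library(ggplot2)

# Same formatting logic as the tooltip() helper in the chunk above
format_tooltip <- function(y, ymax, accuracy = .01) {
  mean <- scales::number(y, accuracy = accuracy)
  sem  <- scales::number(ymax - y, accuracy = accuracy)
  paste("Mean maths scores:", mean, "+/-", sem)
}

scores <- c(2, 4, 6)          # made-up values, not from exam_data

se_band <- mean_se(scores)    # y = mean, ymin/ymax = mean -/+ 1 standard error
label   <- format_tooltip(se_band$y, se_band$ymax)
label

# Assumption: a normal-approximation 90% interval instead of +/- 1 SE
ci_band <- mean_se(scores, mult = qnorm(0.95))
ci_band
```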

Hover effect with data_id aesthetic

The code chunk below shows the second interactive feature of ggiraph, namely data_id. Elements associated with a data_id (i.e. CLASS) are highlighted upon mouse over. Note that the default value of the hover css is hover_css = "fill:orange;".

Show code
exam_data$tooltip <- c(paste0(     
  "Name = ", exam_data$ID,         
  "\n Class = ", exam_data$CLASS)) 

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(           
    aes(data_id = CLASS, tooltip = tooltip),             
    stackgroups = TRUE,               
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618                      
)

The code chunk below uses the same data_id mapping, but styles the highlighting effect with CSS passed to opts_hover() and opts_hover_inv().

Show code
exam_data$tooltip <- c(paste0(     
  "Name = ", exam_data$ID,         
  "\n Class = ", exam_data$CLASS)) 
  
p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(data_id = CLASS, tooltip = tooltip),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618,
  options = list(                        
    opts_hover(css = "fill: #202020;"),  
    opts_hover_inv(css = "opacity:0.2;") 
  )                                        
)

In the code chunk below, elements associated with a data_id (i.e. CLASS) are highlighted upon mouse over. At the same time, the tooltip shows the student's ID and CLASS.

Show code
exam_data$tooltip <- c(paste0(     
  "Name = ", exam_data$ID,         
  "\n Class = ", exam_data$CLASS)) 

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(tooltip = tooltip, 
        data_id = CLASS),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618,
  options = list(                        
    opts_hover(css = "fill: #202020;"),  
    opts_hover_inv(css = "opacity:0.2;") 
  )                                        
)                                        

The onclick argument of ggiraph provides hotlink interactivity on the web. The code chunk below shows an example of onclick.

The web document linked to a data object will be displayed in the web browser upon mouse click.

Show code
exam_data$onclick <- sprintf("window.open(\"%s%s\")",
"https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
as.character(exam_data$ID))

exam_data$tooltip <- c(paste0(     
  "Name = ", exam_data$ID,         
  "\n Class = ", exam_data$CLASS)) 

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(onclick = onclick, tooltip = tooltip),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618)

Note that click actions must be a string column in the data set containing valid JavaScript instructions.
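
As a sketch of what such a column contains, the sprintf() call from the chunk above can be evaluated for a single, made-up ID. The result is the JavaScript statement that runs when the corresponding dot is clicked. Note that the ID is appended directly to the end of the URL.

```r
# Hypothetical ID, standing in for one value of exam_data$ID
id <- "Student001"

js <- sprintf("window.open(\"%s%s\")",
              "https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
              id)

# cat() shows the statement without R's escape characters
cat(js)
```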

Coordinated Multiple Views with ggiraph

Notice that when a data point in one of the dotplots is selected, the data point with the same ID in the second data visualisation is highlighted too.

Steps to build coordinated multiple views
  1. Appropriate interactive functions of ggiraph are used to create the multiple views.

  2. patchwork syntax (p1 + p2) of the patchwork package is used inside girafe() to create the interactive coordinated multiple views.

Show code
p1 <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +  
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL)

p2 <- ggplot(data=exam_data, 
       aes(x = ENGLISH)) +
  geom_dotplot_interactive(              
    aes(data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") + 
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL)

girafe(code = print(p1 + p2), 
       width_svg = 6,
       height_svg = 3,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       ) 

The data_id aesthetic is critical for linking observations between plots. The tooltip aesthetic is optional, but useful for showing details when the mouse is over a point.

2.2 Working with plotly methods

Plotly’s R graphing library creates interactive web graphics from ggplot2 graphs and/or through a custom interface to the (MIT-licensed) JavaScript library plotly.js, inspired by the grammar of graphics. Unlike other plotly platforms, plotly.R is free and open source.

There are two ways to create an interactive graph by using plotly:

  • by using plot_ly(), and

  • by using ggplotly()

2.2.1 Interactive scatterplot: plot_ly() method

A basic interactive plot created by using plot_ly().

Show code
plot_ly(data = exam_data, 
             x = ~MATHS, 
             y = ~ENGLISH)

In the code chunk below, the color argument is mapped to a qualitative visual variable (i.e. RACE).

Show code
plot_ly(data = exam_data, 
        x = ~MATHS, 
        y = ~ENGLISH, 
        color = ~RACE)

2.2.2 Interactive scatterplot: ggplotly() method

The code chunk below plots an interactive scatter plot by using ggplotly().

Notice that the only extra line you need to include in the code chunk is ggplotly().

Show code
p <- ggplot(data=exam_data, 
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
ggplotly(p)

2.2.3 Coordinated Multiple Views with plotly

Click on a data point in one of the scatterplots and see how the corresponding point in the other scatterplot is selected.

Steps to build a coordinated linked plot
  1. highlight_key() of the plotly package is used to create the shared data.

  2. Two scatterplots are created by using ggplot2 functions.

  3. subplot() of the plotly package is used to place them side by side.

Show code
d <- highlight_key(exam_data) # shared data used to link the two plots

# Both plots use the shared data d instead of exam_data
p1 <- ggplot(data=d, 
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

p2 <- ggplot(data=d, 
            aes(x = MATHS,
                y = SCIENCE)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

# subplot() combines the two plotly objects (ggiraph uses patchwork instead)
subplot(ggplotly(p1),
        ggplotly(p2))
Learning Points
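highlight_key() wraps the data frame in a crosstalk SharedData object. It is this shared object, rather than the original data frame, that lets the two scatterplots exchange selections. A minimal check, using the built-in mtcars data set as a stand-in for exam_data:

```r
library(plotly)

# mtcars stands in for exam_data here
d <- highlight_key(mtcars)

class(d)              # includes "SharedData"
nrow(d$origData())    # the wrapped data frame keeps all of its rows
```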

2.3 Working with crosstalk methods

Crosstalk is an add-on to the htmlwidgets package. It extends htmlwidgets with a set of classes, functions, and conventions for implementing cross-widget interactions (currently, linked brushing and filtering).

2.3.1 Interactive Data Table: DT package

  • A wrapper of the JavaScript Library DataTables

  • Data objects in R can be rendered as HTML tables using the JavaScript library ‘DataTables’ (typically via R Markdown or Shiny).

DT::datatable(exam_data, class= "compact")

2.3.2 Linked brushing: crosstalk method

d <- highlight_key(exam_data) 
p <- ggplot(d, 
            aes(ENGLISH, 
                MATHS)) + 
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

gg <- highlight(ggplotly(p),        
                "plotly_selected")  

crosstalk::bscols(gg,               
                  DT::datatable(d), 
                  widths = 5)        
Learning Points
  • highlight() is a function of the plotly package.

    • It sets a variety of options for brushing (i.e. highlighting) multiple plots. These options are designed mainly for linking multiple plotly graphs, and may not behave as expected when linking plotly to another htmlwidget package via crosstalk.

  • bscols() is a helper function of the crosstalk package.

    • It makes it easy to put HTML elements side by side. It can be called directly from the console but is especially designed to work in an R Markdown document. Warning: this will bring in all of Bootstrap!

3. Animated Data Visualisation

Terminology

Key concepts and terminology related to animated visualisation:

  1. Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data.

  2. Animation Attributes: The animation attributes are the settings that control how the animation behaves. For example, you can specify the duration of each frame, the easing function used to transition between frames, and whether to start the animation from the current frame or from the beginning.
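
In plotly, animation attributes such as these are set with animation_opts(). The sketch below uses a small made-up data frame (toy_anim is hypothetical). The frame argument gives the duration of each frame in milliseconds and easing names the transition function.

```r
library(plotly)

# Made-up data: two points observed in three years
toy_anim <- data.frame(
  year = rep(c(2000, 2001, 2002), each = 2),
  id   = rep(c("a", "b"), times = 3),
  x    = c(1, 2, 2, 3, 3, 4),
  y    = c(1, 3, 2, 2, 3, 1)
)

fig <- plot_ly(toy_anim,
               x = ~x, y = ~y,
               frame = ~year,        # one frame per year
               ids = ~id,            # keeps each point's identity across frames
               type = "scatter",
               mode = "markers") %>%
  animation_opts(frame = 1000,       # each frame lasts 1000 ms
                 easing = "linear",  # easing function between frames
                 redraw = FALSE)
```

Printing fig in an R Markdown document or at the console renders the animation with a play button and a frame slider.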

3.1 Working with gganimate methods

gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time.

  • transition_*() defines how the data should be spread out and how it relates to itself across time.

  • view_*() defines how the positional scales should change along the animation.

  • shadow_*() defines how data from other points in time should be presented in the given point in time.

  • enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.

  • ease_aes() defines how different aesthetics should be eased during transitions.

  • transition_time() of gganimate is used to create transition through distinct states in time (i.e. Year).

  • ease_aes() is used to control easing of aesthetics. The default is linear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.
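
How these grammar classes combine can be sketched on a toy data set (toy_anim and its columns are made up for illustration). The plot object below only describes the animation and does not render it.

```r
library(ggplot2)
library(gganimate)

# Made-up data: two points observed in three years
toy_anim <- data.frame(
  year  = rep(c(2000L, 2001L, 2002L), each = 2),
  group = rep(c("a", "b"), times = 3),
  x     = c(1, 2, 2, 3, 3, 4),
  y     = c(1, 3, 2, 2, 3, 1)
)

p <- ggplot(toy_anim, aes(x = x, y = y, colour = group)) +
  geom_point(size = 4) +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +           # transition_*(): animate over year
  ease_aes("cubic-in-out") +        # ease_aes(): slow start and end
  shadow_wake(wake_length = 0.2) +  # shadow_*(): leave a short trail
  enter_fade() +                    # enter_*(): new data fades in
  exit_fade()                       # exit_*(): old data fades out
```

Calling animate(p, nframes = 50, fps = 10) renders the frames, and anim_save() writes the result to a file. With gifski installed, the output is an animated GIF.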

Show code
ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') +
  transition_time(Year) +       
  ease_aes('linear')

For comparison, the code chunk below omits transition_time() and ease_aes(), which gives the static version of the same bubble plot.

Show code
ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') 

3.2 Working with plotly

In Plotly R package, both ggplotly() and plot_ly() support key frame animations through the frame argument/aesthetic. They also support an ids argument/aesthetic to ensure smooth transitions between objects with the same id (which helps facilitate object constancy).

Learning Points
  • Appropriate ggplot2 functions are used to create a static bubble plot. The output is then saved as an R object called gg (see the last code chunk of this section).

  • ggplotly() is then used to convert the R graphic object into an animated svg object.

Tip

If show.legend = FALSE is used in geom_point(), the legend still appears on the plot produced by ggplotly(). To overcome this problem, theme(legend.position='none') should be used, as shown in the ggplotly() code chunk at the end of this section.

The plot_ly() method can also be used to create an animated bubble plot, as shown in the code chunk below.

Show code
bp <- globalPop %>%
  plot_ly(x = ~Old, 
          y = ~Young, 
          size = ~Population, 
          color = ~Continent,
          sizes = c(2, 100),
          frame = ~Year, 
          text = ~Country, 
          hoverinfo = "text",
          type = 'scatter',
          mode = 'markers'
          ) %>%
  layout(showlegend = FALSE)
bp
Show code
gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none')

ggplotly(gg)

4. References

3  Programming Interactive Data Visualisation with R

4  Programming Animated Statistical Graphics with R
